China Science Vector

Contents

  • 1 Publication Trends Over Time
  • 2 Views & Downloads
    • 2.1 General Trend
    • 2.2 Examining Outliers: Pioneer Initiative, Chinese Oppenheimer, Laser Imaging and Space-Weather
  • 3 Organizations
    • 3.1 General Trend: Growth of Collaboration
    • 3.2 Top Contributors
    • 3.3 Collaboration Network
    • 3.4 Geographical Distribution
  • 4 Authors
    • 4.1 General Trend: Increase in Collaboration
    • 4.2 Top Contibutors
  • 5 Fund Projects
  • 6 Keywords
    • 6.1 Article Description Keywords
    • 6.2 Jieba Tokenization
  • 7 Summary

Exploratory Data Analysis

Now that we have a clean dataset, we can start Exploratory Data Analysis (EDA). Even a surface-level analysis can provide valuable insights into the Bulletin of the Chinese Academy of Sciences (BCAS) and its evolution over time. The dataset we’ve constructed includes the following metadata for each article:

  • Titles and publication years
  • Authors and their affiliations
  • Abstracts and keywords
  • Views and downloads
  • Fund projects

This data allows us to explore several key questions about BCAS:

  • What are the publication trends? How have they evolved over the past 36 years?
  • How engaging are these articles? What are the typical view and download counts? Are there notable outliers?
  • How many articles are linked to funded projects?
  • What topics are covered? What are the most frequently discussed concepts?

1 Publication Trends Over Time

The data from the BCAS official website1 indicates a significant growth in publication activity beginning in 2015. As all China analysts remember, 2015 also marks the release of the Made in China 2025 (中国制造2025) initiative, a pivotal industrial strategy for the modern PRC. This correlation suggests that the rise in scientific publication activity may be linked to China’s strategic shift towards becoming a major industrial power, where science and technology play a crucial role.

1 We utilized data from the old version of the BCAS website due to its easier scraping process compared to the new version. The publications and data are identical on both versions, although the old site includes more comprehensive views and downloads data for older articles, and vice versa. The new version of the website can be accessed here.

Figure 1: Number of Publications by Year, 1986-2023.

The publication frequency of the journal has seen a marked increase since 2014, as evidenced by a significant rise in the number of articles. This upward trend in article count suggests that the journal has adopted a more frequent release schedule.

2 Views & Downloads

2.1 General Trend

The surge in publication volume is mirrored by a similar increase in views and downloads. However, these metrics have declined since 2020. The decrease may be attributable to two factors: (1) older publications have had more time to accumulate views, and (2) the launch of a new website version may have redistributed some of the audience. Interestingly, downloads significantly exceed views in 2005 and from 2015 onwards, which is quite unusual. There are several possible explanations:

  • Views and downloads are recorded with error
  • Users might use automated tools to download the articles without viewing them online (maybe they prefer reading PDFs on their computers)
Figure 2: Views and Downloads Trends, 1986-2023.

If we calculate the average views and downloads per article, we find that 2013 had the highest numbers relative to the number of articles. However, since then, both views and downloads have been on a decline. This sort of decline may suggest a few possibilities:

  • The content is not as engaging as it used to be.
  • There might be more content available on similar topics, diluting the audience’s attention.
  • Changes in the hosting platform.
Figure 3: Views and Downloads per Article, 1986-2023.
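
The per-article averages behind this figure can be sketched as follows. This is a minimal illustration using only the standard library; the record layout and the sample numbers are hypothetical, not taken from the BCAS dataset.

```python
from collections import defaultdict

# Hypothetical records: (publication year, views, downloads) per article.
articles = [
    (2012, 2400, 900),
    (2013, 5200, 1500),
    (2013, 4800, 2500),
    (2020, 1800, 2100),
]

def per_article_averages(records):
    """Return {year: (mean views, mean downloads)} over the given records."""
    totals = defaultdict(lambda: [0, 0, 0])  # views, downloads, article count
    for year, views, downloads in records:
        totals[year][0] += views
        totals[year][1] += downloads
        totals[year][2] += 1
    return {y: (v / n, d / n) for y, (v, d, n) in sorted(totals.items())}

averages = per_article_averages(articles)
for year, (mean_views, mean_downloads) in averages.items():
    print(year, round(mean_views), round(mean_downloads))
```

Dividing by the yearly article count is what separates a genuine change in engagement from the effect of the journal simply publishing more.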

2.2 Examining Outliers: Pioneer Initiative, Chinese Oppenheimer, Laser Imaging and Space-Weather

Next, we can create a histogram to visualize the distribution of article views. The histogram reveals a generally bell-shaped distribution with a noticeable right skew. While the majority of data follows a near-normal pattern, there’s a distinct tail extending to the right, signifying the presence of outliers. The peak of the distribution shows that the most common range for article views is between 2,000 and 2,499, with 2,125 publications falling within this range. Articles surpassing 3,000 views are relatively rare, though there is a significant outlier with nearly 120,000 views!

Figure 4: Distribution of Views.
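
As a rough sketch of the binning behind such a histogram: the 2,000 to 2,499 range quoted above suggests bins 500 views wide, which is an assumption here, and the view counts below are hypothetical.

```python
from collections import Counter

BIN_WIDTH = 500  # assumed from the 2,000-2,499 range quoted in the text

def bin_views(view_counts, width=BIN_WIDTH):
    """Count articles per bin; each key is the lower edge of a bin."""
    return Counter((v // width) * width for v in view_counts)

# Hypothetical view counts, including one extreme outlier.
views = [2100, 2499, 2000, 1999, 2500, 3100, 120000]
histogram = bin_views(views)

modal_bin, modal_count = histogram.most_common(1)[0]
print(f"{modal_bin}-{modal_bin + BIN_WIDTH - 1}: {modal_count} articles")
```

With fixed-width bins a single extreme value lands in a bin far to the right of the peak, which is how the long tail appears in the plot.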

Interestingly, the most viewed article discusses the fundamental guidelines for the organization of CAS. Titled Enlightenment from Guiding Principle Adjustment and Corresponding Reform Practice of Chinese Academy of Sciences (中国科学院办院方针调整及实施“率先行动”计划等改革实践的启示), it was authored in 2020 by Zhang Xuecheng (张学成) and Zhong Shaoying (钟少颖), both of whom are a part of CAS bureaucracy affiliated with the Office of General Affairs of CAS (中国科学院办公厅). The article abstract:

This paper reviews the historical background, main contents, policy logic, and reform practice of the six guiding principle adjustments of the Chinese Academy of Sciences (CAS). Particularly, we focus on the logical starting point, main measures, and main results of the first phase of the “Pioneer Initiative” of CAS, and summarize the experience and enlightenment of the development of national strategic scientific and technological forces.

The most viewed article in the dataset offers insight into an important strategy known as the Pioneer Initiative2, for which the CAS is responsible. This initiative outlines the Chinese Academy of Sciences’ reform and development path up to 2030. The plan proposes 25 major reform measures across five key areas3:

2 “Pioneer Initiative” Action Plan (中国科学院率先行动计划) was approved on July 7, 2014 at the 7th meeting of the National Leading Group for Science and Technology System Reform and Innovation System Construction (国家科技体制改革和创新体系建设领导小组). Xi Jinping ordered CAS to implement the plan, turning the concept into reality as soon as possible and contributing to rebuilding China into a world science and technology power (世界科技强国).

3 This information comes from Baidu Baike. The link to the original draft of the initiative is now inaccessible. China is becoming more and more secretive about its S&T strategy. Some additional info about the first stage of the plan can be found here and the CAS webpage with basic info is here.

  • Research institute classification based on 4 categories must align with national demands and innovation goals.
  • Research should concentrate efforts on strategic national needs and global scientific frontiers.
  • China needs to build a national innovation talent base.
  • High-level think tanks with a focus on output.
  • Expand international cooperation to enhance technological services.

The first milestone, set for around 2020, aligns with the 100th anniversary of the founding of the Chinese Communist Party (CCP). At the first stage, the goal was to successfully complete the tasks of the Innovation 2020 initiative4 and largely achieve the Four Priorities objectives. The second milestone, aimed for 2030, involves full realization of the Four Priorities goals, which will lay a strong foundation for establishing China as a world-leading scientific and technological power by the 100th anniversary of the founding of the PRC and the CAS. This progress is crucial for supporting the realization of the Chinese Dream of national rejuvenation (中华民族伟大复兴的中国梦), which plays on Chinese nationalism.

4 The Innovation 2020 (创新2020) initiative aimed to address strategic national challenges and drive technological innovation, with a focus on fostering talent and overcoming systemic barriers. It was set to begin pilot phases in the second half of 2010 and fully launch in 2011. The initiative is seen as a crucial step in making China a leading science and technology power by 2020 and 2030.

5 Focusing on the global scientific and technological frontier (面向世界科技前沿), addressing major national needs (面向国家重大需求), and serving the main battlefield of the national economy (面向国民经济主战场).

6 For example, one of the most frequent contributors to the BCAS, the Institutes of Science and Development (中国科学院科技战略咨询研究院), changed in 2016 from the original Institute of Science and Technology Policy and Management Science (中国科学院科技政策与管理科学研究所).

To achieve these goals, CAS must adhere to the “Three Orientations” (“三个面向”)5 on its path toward the “Four Priorities” (“四个率先”).6

Plotting a graph that displays views against downloads will allow us to easily identify outliers in both categories. This visualization helps us spot articles with unusually high or low values in terms of either views or downloads, thereby highlighting unique cases or trends within the dataset.

Figure 5: Relationship Between Views and Downloads.

Here we can see that even though Enlightenment from Guiding Principle Adjustment and Corresponding Reform Practice of Chinese Academy of Sciences has ~120k views, people do not download this document nearly as much. On the other hand, the most downloaded article, The Prospect of Plant Life Science Research (植物生命科学发展趋势), 2005, has 87k total downloads but only 2,737 views.

Examples of other notable outliers in downloads and views include:

  • Nuclear Bomb Meritorious Scientist – Academician Huang Zuqia (核弹功勋科学家——黄祖洽院士), 2005 – 42,096 downloads / 3,108 views.
  • Laser Imaging Detection and Ranging Technologies and Systems Development (激光成像雷达技术和系统研制), 2013 – 24,864 downloads / 7,645 views.
  • Space Weather Research (空间天气学研究进展), 2005 – 22,381 downloads / 6,795 views.
  • Life-span Development Theory of Human Psychology (人类心理毕生发展理论), 2012 – 3,108 downloads / 18,403 views.
  • Virtual Water—A Strategic Instrument to Achieve Water Security (虚拟水——中国水资源安全战略的新思路), 2003 – 6,795 downloads / 17,445 views.
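
One simple way to put these outliers on a common scale is the downloads-per-view ratio. The sketch below uses the five figures listed above; the shortened titles and the threshold of 1 are illustrative choices, not part of the original analysis.

```python
# (short title, downloads, views) as quoted in the list above
outliers = [
    ("Huang Zuqia", 42096, 3108),
    ("Laser Imaging", 24864, 7645),
    ("Space Weather", 22381, 6795),
    ("Life-span Development", 3108, 18403),
    ("Virtual Water", 6795, 17445),
]

def downloads_per_view(records):
    """Return {title: downloads / views} for each record."""
    return {title: downloads / views for title, downloads, views in records}

ratios = downloads_per_view(outliers)
for title, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    label = "download-heavy" if ratio > 1 else "view-heavy"
    print(f"{title}: {ratio:.2f} ({label})")
```

A ratio above 1 means an article was downloaded more often than its page was viewed, which is the unusual pattern discussed in the General Trend section.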

To get a better picture of audience engagement, we can calculate how many views articles receive on average. Average views peaked for articles published in 2013 and have decreased steadily through 2023.

3 Organizations

3.1 General Trend: Growth of Collaboration

The number of unique organizations contributing to the Bulletin has increased significantly since 2016. Before that year, the annual count of contributing organizations never exceeded 100, with a low of just 15 in 2014. By 2017, the number of distinct author affiliations had risen to 200, peaking at an all-time high of 209 in 2023. This upward trend reflects an increasing diversity of scientific institutions involved with the Bulletin.

Figure 6: Number of Organizations, 1986-2023.

When examining the average number of affiliations per article, the trend is far less robust. A lower average indicates that a larger proportion of articles are produced by a smaller number of organizations. The peak occurred in 2017, with an average of 0.74 affiliations per article, meaning that 100 publications came from 74 distinct organizations. Since then, this average has declined, reaching 0.49 in 2023. This decrease suggests that fewer institutions are consistently contributing to the journal, reflecting a more concentrated involvement by a smaller group of organizations.

Figure 7: Organizations Per Article, 1986-2023.
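
The ratio plotted here can be read as distinct organizations in a year divided by articles in that year. A minimal sketch, assuming each article carries a list of affiliation names (the records and names below are hypothetical):

```python
from collections import defaultdict

# Hypothetical input: (year, list of affiliations) per article.
articles = [
    (2017, ["UCAS", "CASISD"]),
    (2017, ["UCAS"]),
    (2017, ["Institute of Botany"]),
    (2017, ["CASISD"]),
    (2023, ["UCAS"]),
    (2023, ["UCAS"]),
]

def organizations_per_article(records):
    """Return {year: distinct organizations / number of articles}."""
    orgs, counts = defaultdict(set), defaultdict(int)
    for year, affiliations in records:
        orgs[year].update(affiliations)  # a set keeps each organization once
        counts[year] += 1
    return {year: len(orgs[year]) / counts[year] for year in sorted(counts)}

ratio_by_year = organizations_per_article(articles)
print(ratio_by_year)
```

Because the numerator counts each organization only once per year, the ratio falls whenever the same institutions account for a growing share of the articles.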

3.2 Top Contributors

Let’s examine which organizations have the highest count of affiliations. Since organizations may change their official titles over the years7, we’ll focus on the more recent Xi Jinping era.

7 For example, one of the most frequent contributors to the BCAS, the Institutes of Science and Development (中国科学院科技战略咨询研究院), changed in 2016 from the original Institute of Science and Technology Policy and Management Science (中国科学院科技政策与管理科学研究所).

Figure 8: Top-10 Organizations by Frequency, 2013-2023.

Leading by a significant margin is the University of the Chinese Academy of Sciences (UCAS) with 414 affiliations, followed by the Chinese Academy of Sciences (CAS) with 243, and the Institutes of Science and Development. The high number of occurrences for UCAS suggests that many scientists and experts working on cutting-edge and strategic technologies are at least partially affiliated with this institution. Despite its prominence, UCAS receives far less public attention compared to Peking University (49 affiliations) or Tsinghua University (32 affiliations).

As the official overview notes, the CAS Institutes of Science and Development (CASISD)8 is:

8 The title likely should be “Institute,” but “Institutes” is how it appears on the official website.

… a research organization supporting the Academic Divisions of CAS (CASAD) to play its role as China’s highest advisory body in science and technology, and a comprehensive integration platform for CAS to build a high-level national S&T think tank.9

9 CASISD was prioritized as one of the initial pilot organizations in Xi Jinping’s efforts to establish high-caliber national S&T think tanks (“率先建成国家高水平科技智库”).

CASISD brings together top strategy and consultation research teams from other CAS institutions, such as the National Science Library10 and the CAS Institute of Geographical Sciences and Natural Resources Research. These factors make CASISD a key player in shaping China’s science and technology (S&T) policy.

10 One of China’s key organizations for S&T ‘information’ or ‘intelligence’ (情报). For more details see: William Hannas and Huey-Meei Chang, “China’s STI Operations” (Center for Security and Emerging Technology, January 2021). https://doi.org/10.51593/20200049

11 One of Xi Jinping’s earliest initiatives as Party Secretary was the concept of Ecological Civilization (生态文明). According to Xinhua, Xi had already expressed concern for the environment in 2005, when he famously stated “Lucid waters and lush mountains are mountains of gold and silver” (绿水青山就是金山银山).

Another significant group of contributors consists of environmental science organizations, including the Institute of Geographic Sciences and Natural Resources Research, the Institute of Remote Sensing and Digital Earth, the Research Center for Eco-Environmental Sciences, and the Institute of Botany. The prominence of these institutions underscores China’s strong focus on energy and ecological issues, which aligns closely with the country’s current agenda.11

3.3 Collaboration Network

To understand how these organizations are connected, we can use a network graph that illustrates the collaborations between them. A collaboration is defined by the presence of two distinct organizations mentioned in the article metadata.
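
Under that definition, the edge list of the network can be sketched by pairing up the distinct organizations within each article. The affiliation lists below are hypothetical, and counting a "connection" as a distinct partner organization is an assumption; the original analysis may weight repeated collaborations differently.

```python
from collections import Counter
from itertools import combinations

# Hypothetical affiliation lists, one per article.
article_affiliations = [
    ["UCAS", "CAS", "CASISD"],
    ["UCAS", "CAS"],
    ["Institute of Botany"],      # single organization: no collaboration
    ["CASISD", "UCAS", "UCAS"],   # duplicates within an article are collapsed
]

def collaboration_edges(articles):
    """Return a Counter mapping each organization pair to its co-occurrence count."""
    edges = Counter()
    for affiliations in articles:
        # Sorting makes (A, B) and (B, A) the same undirected edge.
        for pair in combinations(sorted(set(affiliations)), 2):
            edges[pair] += 1
    return edges

edges = collaboration_edges(article_affiliations)

# Assumed meaning of "connections": number of distinct partner organizations.
connections = Counter()
for a, b in edges:
    connections[a] += 1
    connections[b] += 1

print(edges.most_common())
print(connections.most_common())
```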

From this analysis, we find that the University of CAS (Chinese Academy of Sciences) leads with the highest number of connections (216), followed by CAS itself (132) and the Institutes of Science and Development (103). The University of CAS’s prominence in the network can likely be attributed to the fact that authors often have multiple affiliations, including both their primary research facility and the educational institution where they teach. CAS’s significant number of connections is also expected, as many contributing authors are CAS academicians with multiple leadership roles. The extensive connections of the Institutes of Science and Development indicate that this think tank leverages a diverse range of expertise from across the S&T fields.

Other noteworthy organizations with high numbers of connections include the Institute of Remote Sensing and Digital Earth (102), the Institute of Geographic Sciences and Natural Resources Research (100), the Institute of Atmospheric Physics (78), and the Northwest Institute of Eco-Environment and Resources (71). A high number of connections among these institutions suggests several key insights:

  • Higher Research Complexity: The involvement of more people from different organizations points to the complexity of the research projects.
  • Interdisciplinary Collaboration: A significant number of connections likely reflects a high degree of interdisciplinarity in the research.
  • Consultation and Policy Influence: The collaboration with the Institutes of Science and Development suggests that these organizations are involved in shaping China’s S&T policy, particularly in areas of remote sensing, geographic sciences, and environmental research, indicating strategic focus in these vectors.
Figure 9: Collaboration Network of Organizations, 2013-2023.

The total number of organizations with at least one connection between 2013 and 2023 is 661, but the network density is only 1.08%. This low network density indicates that these organizations are largely divided into relatively small clusters, each connected by specific research topics. This suggests that collaboration tends to be more specialized and focused within certain areas rather than broadly interconnected across the entire network.

Examining the yearly collaborations reveals intriguing dynamics in network density. The highest density occurred in 2013 at 25.71%, but this involved only 15 organizations, indicating a tightly knit network. Interestingly, 2014 saw no recorded collaborations, while in 2015, the number of participating institutions more than tripled to 49 compared to 2013. From 2016 onward, the number of collaborating organizations steadily increased, while the density dropped to between 2.49% and 3.94%. This decline suggests the formation of distinct topic clusters, reflecting a diversification in the journal’s scope.
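
For reference, the density of an undirected network with N nodes and E links is usually defined as 2E / (N(N-1)), the share of possible pairs that are actually linked. The sketch below assumes this is the definition used here. The edge count of 27 is not stated in the text; it is inferred as the value that reproduces the reported 2013 figure for 15 organizations.

```python
def network_density(n_nodes, n_edges):
    """Density of an undirected graph without self-loops: 2E / (N(N-1))."""
    if n_nodes < 2:
        return 0.0
    return 2 * n_edges / (n_nodes * (n_nodes - 1))

# 15 organizations allow 15 * 14 / 2 = 105 possible pairs.
# 27 links (an inferred value, see above) give the reported 2013 density.
density_2013 = network_density(15, 27)
print(f"{density_2013:.2%}")  # 25.71%
```

The number of possible pairs grows quadratically with the number of organizations, so density tends to fall as a network grows even when the absolute number of collaborations rises.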

The most extensive networks emerged in 2017, with 182 organizations, and in 2023, with 198, highlighting the growing breadth of collaboration over time.

Figure 10: Collaboration Network of Organizations by Year, 2013-2023.

3.4 Geographical Distribution

Examining the geographical distribution of organizations from 2013 to 2023, it’s clear that Beijing leads by a significant margin, with 306 organizations based there. In comparison, Shanghai and Nanjing trail far behind, with only 45 and 24 organizations respectively. Beijing also overwhelmingly dominates in total mentions, with 2,029 affiliations recorded.

Beijing’s dominance is expected for the nation’s capital, particularly in policy-advising publications. What’s more intriguing are the cities that follow Beijing in this ranking.

Figure 11: Comparison of Total Affiliations and Unique Organizations Across Major Cities, 2013-2023.

If we exclude Beijing, the regional differences become less pronounced. It’s evident that Shanghai, Wuhan, Nanjing, Qingdao, and Lanzhou emerge as some of the top contributors.

Figure 12: Total Affiliations and Unique Organizations Across Major Cities excluding Beijing, 2013-2023.
Figure 13: Dynamics of Cities Contributions, 2013-2023.

This interactive map highlights the geographical distribution of organizations contributing to the BCAS, providing a visual representation of their spread across various regions.

Figure 14: Organizational Affiliations by City, 2013-2023.

4 Authors

4.1 General Trend: Increase in Collaboration

How many scientists publish their research and advice on strategic matters here?

The graph indicates a surge in numbers of unique authors12 starting from 2016 with 531 authors. The number of authors continues to increase, reaching its peak in 2023 with 785 experts.

12 Excluding the articles where the author data is missing.

Figure 15: Total Number of Authors by Year, 1986-2023.

The growth in total numbers can be attributed to the expansion of the journal itself, making it sensible to consider the average number of authors per publication. Nonetheless, a significant increase in these figures is evident between 2016 and 2023. The numbers peaked in 2016 at 2.87 (indicating 287 experts per 100 articles), then dropped to 1.95 in 2020, but rebounded to 2.66 in 2023.

These statistics suggest that collaboration in research is increasing, indicating several trends:

  • The complexity of research is growing over the years.
  • Research is becoming more interdisciplinary.
  • Research is becoming more specialized and focused.
Figure 16: Average Number of Authors per Article, 1986-2023.

4.2 Top Contributors

Our next step is to identify the key contributors to the BCAS. To gauge their impact, we will use several metrics:

  • Number of articles authored
  • Total cumulative views of their articles
  • Average views per article
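
The three metrics above can be sketched as a single pass over the articles. The records and author names below are hypothetical. Crediting every co-author with an article's full view count is an assumption, though it appears consistent with the totals reported for co-authored articles in this section.

```python
from collections import defaultdict

# Hypothetical records: (list of authors, views) per article.
articles = [
    (["Author A", "Author B"], 120000),
    (["Author C"], 3000),
    (["Author C"], 2500),
    (["Author C", "Author B"], 3500),
]

def author_metrics(records):
    """Return {author: {"articles", "total_views", "avg_views"}}."""
    stats = defaultdict(lambda: {"articles": 0, "total_views": 0})
    for authors, views in records:
        for author in set(authors):  # count each author once per article
            stats[author]["articles"] += 1
            stats[author]["total_views"] += views  # full credit to each co-author
    for s in stats.values():
        s["avg_views"] = s["total_views"] / s["articles"]
    return dict(stats)

metrics = author_metrics(articles)
by_articles = sorted(metrics, key=lambda a: metrics[a]["articles"], reverse=True)
by_views = sorted(metrics, key=lambda a: metrics[a]["total_views"], reverse=True)
print(by_articles)
print(by_views)
```

Even in this toy data the two rankings disagree: one heavily viewed article is enough to lift its authors above a more prolific contributor.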

By ranking the BCAS contributors based on the number of articles published, we can gain the following insights:

Author | Articles | Bio
Bai Chunli (白春礼) | 36 | Chemist and nanotechnology expert; an academician of the CAS, a member of the Academy of Sciences for the Developing World, and a member or foreign academician of the National Academy of Sciences of the United States, the Royal Society of London, the American Academy of Arts and Sciences, the European Academy of Sciences, the Russian Academy of Sciences, and academies of other countries and regions. He is the first chairman of the “Belt and Road” International Science Organization Alliance, the honorary chairman of the presidium of the CAS, and the honorary president of the University of CAS and the University of Science and Technology of China.
Pan Jiaofeng (潘教峰) | 27 | Graduated from the Department of Computer Science and Engineering of the Chinese University of Hong Kong with a Master’s degree. He is a researcher and doctoral supervisor. Currently serving as a representative of the 14th National People’s Congress, President of the Institutes of Science and Development of CAS, and Director of the Institute of Policy and Management of CAS.
Guo Huadong (郭华东) | 21 | Born on October 6, 1950, in Feng County, Jiangsu, Guo Huadong is a geoscientist and an academician of CAS. He is also an academician of the Academy of Sciences for the Developing World and a foreign academician of the Russian Academy of Sciences. Guo is a Fellow of the International Council for Science and serves as a researcher and doctoral supervisor at the Institute of Remote Sensing and Digital Earth, CAS. He is the Director of the International Research Center for Sustainable Development Big Data and the Honorary President of the International Society for Digital Earth.
Fan Jie (樊杰) | 20 | Born in March 1961, Fan Jie is Vice President at the Institutes of Science and Development, CAS, and serves as the Director of the CAS Center for Sustainable Development and the Key Laboratory of Regional Sustainable Development Analysis and Simulation. Fan also chairs the Policy Research and Planning Expert Group at the China International Engineering Consulting Corporation and is a member of the National Planning Expert Committee. He graduated from Peking University’s Department of Geography and Urban and Regional Planning in 1982 and has been associated with the Institute of Geographic Sciences and Natural Resources Research, CAS, throughout his career. In June 2022, he was elected a Fellow of the Chinese Geographical Society. Fan has also served as a member of the 13th and 14th National Committee of the Chinese People’s Political Consultative Conference (CPPCC) and as a member of its Education, Science, Culture, Health, and Sports Committee.
Wang Hongsheng (王竑晟) | 13 | Affiliated with the Bureau of Science and Technology for Development; previously worked at the Institute of Zoology.

When ranking authors by total views, the order shifts significantly. Zhong Shaoying and Zhang Xuecheng, both officials within the CAS system, top the list, primarily due to their authorship of the most-viewed outlier discussed in the Views & Downloads section. Bai Chunli moves to third place, and Fan Jie drops to fifth. Guo Huadong, affiliated with the International Research Center of Big Data for Sustainable Development Goals, the Aerospace Information Research Institute, and the University of Chinese Academy of Sciences, occupies the fourth position.

Author | Total Views | Affiliation
Zhong Shaoying (钟少颖) | 126,378 | Office of General Affairs of CAS (中国科学院办公厅)
Zhang Xuecheng (张学成) | 119,748 | Office of General Affairs of CAS (中国科学院办公厅)
Bai Chunli (白春礼) | 103,603 | Chinese Academy of Sciences (中国科学院)
Guo Huadong (郭华东) | 84,128 | International Research Center of Big Data for Sustainable Development Goals (可持续发展大数据国际研究中心); Aerospace Information Research Institute, CAS (中国科学院空天信息创新研究院); University of Chinese Academy of Sciences (中国科学院大学)
Fan Jie (樊杰) | 70,560 | Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences (中国科学院地理科学与资源研究所); Institutes of Science and Development, Chinese Academy of Sciences (中国科学院科技战略咨询研究院); College of Resources and Environment, University of Chinese Academy of Sciences (中国科学院大学资源与环境学院)

5 Fund Projects

Another crucial insight pertains to the financing of scientific research. Some of the studies published were conducted under funded projects. These projects are primarily applied research initiatives supported by grant funding13, which provides the necessary financial backing.

13 The National Natural Science Foundation of China (自然科学基金项目, NSFC) plays a very important role here. NSFC operates nationwide and is a crucial component of the national innovation system. It primarily funds basic research in natural sciences and some applied research. The NSFC is responsible for the implementation and management of these funds. 2024 Project Guide (2024项目指南) describes various kinds of fund projects:

  • General Projects (面上项目)
  • Young Scientist Fund Projects (青年科学基金项目)
  • Regional Science Fund Projects (地区科学基金项目)
  • Key Projects (重点项目)
  • Big Projects (重大项目) and more

We can visualize the counts of funded projects and observe that, compared to previous periods, there was a substantial increase in numbers between 2016 and 2023.

Figure 17: Number of Fund Projects, 2013-2023.

However, it’s important to consider that the number of BCAS issues has also increased. Therefore, it’s more meaningful to evaluate the funded projects relative to the total number of publications. This approach reveals that the proportion of publications associated with funded projects began to rise in 2013, declined during 2018-2019, and has been increasing again since 2020.

Figure 18: Share of Articles Associated with Fund Projects, 2013-2023.

6 Keywords

Scientific articles typically include a set of keywords that describe their research focus. By analyzing these keywords, we can identify the most frequently used terms and determine the primary topics of interest for Chinese experts.

6.1 Article Description Keywords

We can begin by analyzing the keywords as they are presented in the descriptions. Given that Chinese language characters are not separated by spaces, we will initially treat these character sets as single words or concepts. For instance, the term 可持续发展 will be considered as one word, whereas in English it translates to two separate words: ‘sustainable development.’
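
Treating each delimited phrase as one unit amounts to splitting the keyword field on commas and counting the pieces. A minimal sketch, assuming the field is a comma-separated string; the three sample strings are keyword fields from the dataset, and handling the full-width comma is a precaution rather than something the source confirms is needed.

```python
from collections import Counter

# Sample keyword fields in comma-separated form.
keyword_fields = [
    "碳达峰,碳中和,油气安全,关系,路径,战略",
    "页岩油,能源安全,开发利用,能源体系,政策建议,中国",
    "碳达峰,碳中和,碳中和学,新能源,能源转型,能源独立,碳中和社会",
]

def count_keywords(fields):
    """Count whole keyword phrases; no segmentation inside a phrase."""
    counts = Counter()
    for field in fields:
        # Normalize the full-width comma to the ASCII one before splitting.
        for keyword in field.replace("，", ",").split(","):
            keyword = keyword.strip()
            if keyword:
                counts[keyword] += 1
    return counts

keyword_counts = count_keywords(keyword_fields)
total = sum(keyword_counts.values())
for keyword, count in keyword_counts.most_common(3):
    print(keyword, count, f"{count / total:.2%}")
```

Note that 碳中和, 碳中和学 and 碳中和社会 are counted as three unrelated keywords under this approach, which is one motivation for tokenizing the phrases further.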

Figure 19: Top 20 Keywords Between 2013-2023.

This analysis reveals that the most frequently mentioned keyword is CAS (中国科学院), indicating a strong focus on this organization within the articles. When combined with terms such as ‘basic research’ (基础研究), ‘scientist’ (科学家), ‘institute’ (研究所), ‘CAS academician’ (中国科学院院士), ‘department member’ (学部委员), and ‘department of CAS’ (中国科学院学部), it is evident that a major topic in BCAS centers around the internal workings of CAS, its organizational structure, and key figures within it.

Additionally, keywords like ‘sustainable development’ (可持续发展), ‘suggestion’ (建议), ‘climate change’ (气候变化), ‘key national laboratory’ (国家重点实验室), and ‘measure’ (对策) highlight a focus on science and technology policy and environmental issues.

Having established the broader trends, we should now do a more detailed analysis by examining the top keywords by year.

Figure 20: Top-10 Keywords by Year, 2013-2023.

Ranking keywords by their share of all keywords from 2013 to 2023 reveals several noteworthy trends.

In 2013, the leading concept was ‘ecological civilization’ (生态文明), which accounted for 3.39% of all keywords. A closely related term, ‘building of ecological civilization,’ ranked third at 1.21%. This reflects the influence of Xi Jinping, who became General Secretary in 2012, indicating a strong alignment between official political discourse and the focus of BCAS publications.

The emergence of discourse on S&T (science and technology) superiority is evident in the following keywords:

  • 2016: ‘Intellectual property powerhouse’ (知识产权强国) - 0.67%
  • 2017 and 2018: ‘Global S&T superpower’ (世界科技强国) and ‘S&T superpower’ (科技强国) - 0.86% and 0.57% respectively
  • 2019: ‘S&T superpower’ (科技强国) - 1.13%
  • 2022: ‘S&T superpower’ (科技强国) - 0.65%

Further evidence of policy influence is the rise in mentions of terms related to carbon neutrality, a significant initiative in China. ‘Carbon neutrality’ (碳中和) emerged as the top keyword in both 2021 and 2022, alongside the ‘dual carbon’ goal (’双碳’目标), and continued to feature prominently in 2023 publications.

The Belt & Road Initiative also appears frequently, topping the list in 2016, 2017, 2021, and 2023.

6.2 Jieba Tokenization

Another way to analyze the keyword descriptions is by tokenizing them—breaking them down into smaller components while retaining their semantic meaning. In the case of Chinese text, the jieba library is an effective tool for tokenization.

Here is an example of the result of tokenization:

Original Tokenized
碳达峰,碳中和,油气安全,关系,路径,战略 碳达峰 碳 中 和 油气 安全 关系 路径 战略
能源安全,煤炭智能精准开采,清洁高效利用,碳中和科学发展 能源安全 煤炭 智能 精准 开采 清洁 高效 利用 碳中 和 科学 发展
能源安全,高质量发展,综合能源保障体系,全方位安全观,能源与矿业治理 能源安全 高质量 发展 综合 能源 保障体系 全方位 安全观 能源 与 矿业 治理
页岩油,能源安全,开发利用,能源体系,政策建议,中国 页岩 油 能源安全 开发利用 能源 体系 政策 建议 中国
碳达峰,碳中和,碳中和学,新能源,能源转型,能源独立,碳中和社会 碳达峰 碳 中 和 碳 中和学 新能源 能源 转型 能源 独立 碳中 和 社会
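
To build some intuition for why a term such as 碳中和 can end up split into pieces, here is a toy greedy longest-match segmenter. It is a deliberately simplified stand-in, not jieba's algorithm (jieba combines a large dictionary with statistical scoring), and the small vocabulary is hypothetical. With this vocabulary it happens to reproduce the first row of the table.

```python
def longest_match(phrase, vocabulary):
    """Greedy forward longest-match; unknown characters become single tokens."""
    longest = max((len(word) for word in vocabulary), default=1)
    tokens, i = [], 0
    while i < len(phrase):
        for size in range(min(longest, len(phrase) - i), 0, -1):
            piece = phrase[i:i + size]
            if size == 1 or piece in vocabulary:
                tokens.append(piece)
                i += size
                break
    return tokens

def tokenize_field(field, vocabulary):
    """Split a comma-separated keyword field, then segment each phrase."""
    tokens = []
    for phrase in field.split(","):
        tokens.extend(longest_match(phrase.strip(), vocabulary))
    return tokens

# Hypothetical vocabulary that does not know the term 碳中和.
toy_vocabulary = {"碳达峰", "油气", "安全", "关系", "路径", "战略"}
field = "碳达峰,碳中和,油气安全,关系,路径,战略"

tokens = tokenize_field(field, toy_vocabulary)
tokens_with_term = tokenize_field(field, toy_vocabulary | {"碳中和"})
print(" ".join(tokens))
print(" ".join(tokens_with_term))
```

Once the term is in the vocabulary it stays whole. In jieba, a comparable effect can be obtained by registering domain terms with `jieba.add_word` before tokenizing.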

If we rank the tokens by frequency, we can gain a clearer understanding of the major topics covered in BCAS. The most common terms include: CAS, research, development, S&T, innovation, science, technology, institution, country, China, strategy, ecology, cooperation, international, engineering, academician, biology, laboratory, work, and basic. The distribution of these terms is now smoother and more balanced.

Figure 21: Top 20 Tokens in BCAS Article Keywords Between 2013-2023.
Figure 22: Top-10 Tokens by Year, 2013-2023.
jieba_word_counts_stats.head(10)
keyword count share cumulative_share
0 中国科学院 991 2.219833 2.219833
1 研究 823 1.843514 4.063347
2 发展 728 1.630715 5.694062
3 科技 647 1.449275 7.143337
4 创新 535 1.198396 8.341733
5 科学 534 1.196156 9.537889
6 技术 417 0.934077 10.471966
7 研究所 342 0.766078 11.238044
8 国家 314 0.703358 11.941402
9 中国 277 0.620478 12.561880

When we plot the cumulative share of words by their index, the resulting graph exhibits a classic Pareto distribution: a relatively small number of tokens is responsible for a large percentage of word occurrences.

Figure 23: Cumulative Distribution of Tokens, 2013-2023.

Indeed, calculations reveal that 21.48% of the tokens account for approximately 80% of the words. The distribution is highly skewed, with a “long tail” of less frequent keywords: a few keywords are very common, while many others are rare. This suggests that BCAS has a few heavily emphasized, recurring themes that are central to the journal, while the remaining topics are more specialized or niche.
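
The share and cumulative-share columns, and the fraction of distinct tokens needed to reach a given coverage, can be sketched as below. The token counts are hypothetical placeholders and the function names are illustrative, not those used in the original notebook.

```python
def cumulative_shares(counts):
    """Return [(token, count, share %, cumulative share %)], most frequent first."""
    total = sum(counts.values())
    rows, running = [], 0.0
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    for token, count in ranked:
        share = 100 * count / total
        running += share
        rows.append((token, count, share, running))
    return rows

def fraction_of_tokens_covering(counts, target=80.0):
    """Fraction of distinct tokens needed to reach `target` % of occurrences."""
    rows = cumulative_shares(counts)
    for rank, (_, _, _, cumulative) in enumerate(rows, start=1):
        if cumulative >= target:
            return rank / len(rows)
    return 1.0

# Hypothetical token counts (100 occurrences in total).
toy_counts = {"a": 70, "b": 15, "c": 8, "d": 5, "e": 2}
rows = cumulative_shares(toy_counts)
fraction = fraction_of_tokens_covering(toy_counts)
print(f"{fraction:.0%} of distinct tokens cover 80% of occurrences")
```

Applied to the real token counts, the same calculation would yield the 21.48% figure quoted above, assuming it was derived in this way.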

Figure 24: Distribution of Token Frequencies.

7 Summary

We found some intriguing patterns in the BCAS (Bulletin of the Chinese Academy of Sciences) data. We can draw the following conclusions:

  • The BCAS not only grows in size, but the number of experts and organizations contributing to the journal’s expertise also increases significantly. It’s also evident that more people collaborate on papers, suggesting the complexity and interdisciplinarity of research is on the rise.
  • Examining outliers in views and downloads helped us gain insights into China’s science and technology strategies, as the most viewed article was written on this matter. This also revealed which particular experts’ works are most engaging to the public — both academicians and science bureaucrats. Interestingly, those with the most views do not necessarily have the most publications.
  • Exploring the contributing organizations, we found the top contributors to the BCAS between 2013-2023 are the Chinese Academy of Sciences (CAS) and its affiliated university, as well as the Institutes of Science and Development (an S&T policy think tank) and research institutions working on environmental issues. While the majority of publications come from Beijing, other cities like Shanghai, Wuhan, Nanjing, and Lanzhou also contribute articles.
  • Analyzing the count of funded research projects over the years shows that the share of funded research represented in the BCAS is increasing. We can infer that the proportion of applied and natural science research is growing.
  • The organization and keyword frequencies suggest a connection with major government policies and Xi Jinping’s initiatives.

Next, we need to gain a deeper understanding of the contents of the BCAS publications. There are around 7,000 articles in the dataset, and we cannot thoroughly read all of them. However, we can gain some understanding of this data through “distant reading” techniques, particularly topic modeling.


By Dzmitry Mazanik, 2024